Kannada named entity recognition and classification (nerc) based on multinomial naïve bayes (mnb) classifier
نویسندگان
چکیده
Named Entity Recognition and Classification (NERC) is a process of identification of proper nouns in the text and classification of those nouns into certain predefined categories like person name, location, organization, date, and time etc. NERC in Kannada is an essential and challenging task. The aim of this work is to develop a novel model for NERC, based on Multinomial Naïve Bayes (MNB) Classifier. The Methodology adopted in this paper is based on feature extraction of training corpus, by using term frequency, inverse document frequency and fitting them to a tf-idf-vectorizer. The paper discusses the various issues in developing the proposed model. The details of implementation and performance evaluation are discussed. The experiments are conducted on a training corpus of size 95,170 tokens and test corpus of 5,000 tokens. It is observed that the model works with Precision, Recall and F1-measure of 83%, 79% and 81% respectively.
منابع مشابه
Mapping Arabic Wikipedia into the Named Entities Taxonomy
This paper describes a comprehensive set of experiments conducted in order to classify Arabic Wikipedia articles into predefined sets of Named Entity classes. We tackle using four different classifiers, namely: Naïve Bayes, Multinomial Naïve Bayes, Support Vector Machines, and Stochastic Gradient Descent. We report on several aspects related to classification models in the sense of feature repr...
متن کاملNamed Entity Recognition and Classification in Kannada Language
Named Entity Recognition and classification (NERC) is an essential and challenging task in (NLP). Kannada is a highly inflectional and agglutinating language providing one of the richest and most challenging sets of linguistic and statistical features resulting in long and complex word forms, which is large in number. It is primarily a suffixing Language and inflected word starts with a root an...
متن کاملPersonality Traits Recognition on Social Network - Facebook
For the natural and social interaction it is necessary to understand human behavior. Personality is one of the fundamental aspects, by which we can understand behavioral dispositions. It is evident that there is a strong correlation between users’ personality and the way they behave on online social network (e.g., Facebook). This paper presents automatic recognition of Big-5 personality traits ...
متن کاملNaive Bayes for Text Classification with Unbalanced Classes
Abstract. Multinomial naive Bayes (MNB) is a popular method for document classification due to its computational efficiency and relatively good predictive performance. It has recently been established that predictive performance can be improved further by appropriate data transformations [1, 2]. In this paper we present another transformation that is designed to combat a potential problem with ...
متن کاملImprovement of Chemical Named Entity Recognition through Sentence-based Random Under-sampling and Classifier Combination
Chemical Named Entity Recognition (NER) is the basic step for consequent information extraction tasks such as named entity resolution, drug-drug interaction discovery, extraction of the names of the molecules and their properties. Improvement in the performance of such systems may affects the quality of the subsequent tasks. Chemical text from which data for named entity recognition is extracte...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- CoRR
دوره abs/1509.04385 شماره
صفحات -
تاریخ انتشار 2015